Method, device, encoder apparatus, decoder apparatus and audio system



The present invention relates to a method and device for processing a stereo 
signal obtained from an encoder, which encoder encodes an N-channel audio signal into left 
and right signals and spatial parameters. The invention also relates to an encoder apparatus 
comprising such an encoder and such a device. 
The present invention also relates to a method and device for processing a
stereo signal obtained by such a method and such a device for processing a stereo signal
obtained from an encoder. The invention also relates to a decoder apparatus comprising such
a device for processing a stereo signal.

The present invention also relates to an audio system comprising such an
encoder apparatus and such a decoder apparatus.

For a long time, stereo reproduction of music, for example in the home
environment, has been prevalent. During the 1970s, some experiments were done with
four-channel reproduction in home music equipment.

In larger halls, such as film theatres, multi-channel reproduction of sound has 
been present for a long time. Dolby Digital® and other systems were developed for providing 
realistic and impressive sound reproduction in a large hall. 

Such multi-channel systems have been introduced in the home theatre and are
gaining large interest. Thus, systems having five full-range channels and one part-range
channel or low-frequency effects (LFE) channel, so-called 5.1 systems, are today common on
the market. Other systems also exist, such as 2.1, 4.1, 7.1 and even 8.1.

With the introduction of SACD and DVD, multi-channel audio reproduction is
gaining further interest. Many consumers already have the possibility of multi-channel
playback in their homes, and multi-channel source material is becoming popular.

Because of the increased popularity of multi-channel material, efficient coding of
multi-channel material is becoming more important, which is also recognized by
standardization bodies such as MPEG.

Previously known encoders often do not apply efficient methods to encode
multi-channel audio. The input channels may be basically encoded individually (possibly
after matrixing), thus requiring a high bit rate due to the large number of channels.

However, a multi-channel audio encoder may generate a 2-channel down-mix
which is compatible with 2-channel reproduction systems, while still enabling high-quality
multi-channel reconstruction at the decoder side. The high-quality reconstruction is
controlled by transmitted parameters P which control the stereo-to-multi-channel upmix
process. These parameters contain information describing, amongst others, the ratio of front
versus surround signal which is present in the 2-channel down-mix. Using such an approach,
a decoder can control the amount of front versus surround signal in the upmix process. In
other words, the parameters describe important properties of the spatial sound field which
was present in the original multi-channel signal, but which is lost in the stereo mix due to the
down-mix process.

The current invention relates to the possibility to use this parameterized spatial
information to apply parameter-dependent, preferably invertible, post-processing on a
2-channel down-mix to enhance the down-mix, such as the perceptual quality or spatial
properties thereof.



An object of the present invention is to make post-processing of the down-mix
possible after encoding, based upon the parameters as determined in the multi-channel
encoder, and still maintain the possibility of multi-channel decoding without influences of the
post-processing.

This object is achieved by a method and a device for processing a stereo signal
obtained from an encoder, which encoder encodes an N-channel (N>2) signal into left and
right signals and spatial parameters. The method comprises processing of said left and right
channel signals in order to provide processed signals. The processing is controlled in
dependence on said spatial parameters. The general idea is to use the spatial parameters
obtained from an N-channel-to-stereo coder to control a certain post-processing algorithm. In
this way, the stereo signal obtained from the encoder may be processed, for example for
enhancing the spatial impression.

In an embodiment of the invention, the processing is controlled by a first
parameter for each input channel, i.e. for each of the left and right signals, which first
parameter is dependent on the spatial parameters. The first parameter may be a function of
time and/or frequency. Thus, the system may apply a variable amount of post-processing,
the actual amount depending on the spatial parameters. The post-processing may be
performed individually in different frequency bands. The encoder delivers
independent spatial parameters describing the spatial image for a set of frequency bands. In
that case, the first parameter may be frequency-dependent.

In another embodiment of the invention, the post-processing comprises adding
a first, second and third signal in order to obtain said processed channel signals. The first
signal includes the first input signal, i.e. the left or right signal, modified by a first transfer
function, the second signal includes the first input signal modified by a second transfer
function, and the third signal includes the second input signal, i.e. the right or left signal,
modified by a third transfer function. The second transfer function may comprise said first
parameter and a first filter function. The first transfer function may comprise a second
parameter, whereby the sum of said first parameter and said second parameter can be unity.
The third transfer function may comprise said first parameter of the second input signal and a
second filter function.

The filter functions may be time-invariant.

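In one specific embodiment, the signals may be described by the equation:

$$\begin{bmatrix} L_{0w} \\ R_{0w} \end{bmatrix} = \begin{bmatrix} (1-w_l)^a + w_l^a H_1 & w_r^a H_3 \\ w_l^a H_2 & (1-w_r)^a + w_r^a H_4 \end{bmatrix} \begin{bmatrix} L_0 \\ R_0 \end{bmatrix}$$

with a being a constant.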

Using this representation, the filtering effect of the filter functions H1, H2, H3
and H4 is variable by varying the parameters w_l and w_r. If both parameters have values equal
to zero, the post-processed signals L0w, R0w are essentially equal to the stereo input signal
pair L0, R0. On the other hand, if the parameters are +1, the post-processed stereo pair L0w,
R0w is fully processed by the filter functions H1, H2, H3 and H4. This invention makes
it possible to control the actual amount of filtering, i.e., the value of the parameters w_l and w_r,
by the spatial parameters P.

According to an embodiment, the filter functions and parameters are selected 
so that the transfer function matrix is invertible. This makes reconstruction of the original 
stereo signal possible. 

In another aspect, the invention provides a device for processing a stereo
signal in accordance with the above-mentioned methods, and an encoder apparatus
comprising such a device.

In another aspect of the invention there is provided a method and a device for
inverting the processing in accordance with the above-mentioned methods, and a decoder
apparatus comprising such an inverting device.

In yet another aspect of the invention there is provided an audio system
comprising such an encoder apparatus and such a decoder apparatus.



Further objects, features and advantages of the invention will appear from the
following detailed description of the invention with reference to embodiments thereof and
with reference to the appended drawings, in which:

Fig. 1 shows a schematic block diagram of an encoder/decoder audio system 
including post-processing and inverse post-processing according to the present invention. 

Fig. 2 shows a detailed block diagram of an embodiment of a device for
post-processing a stereo signal obtained from a multichannel encoder.

Fig. 3 shows a block diagram of another embodiment of the device for
post-processing a stereo signal obtained from a multichannel decoder.

Fig. 4 shows a block diagram of an embodiment of the device for inversely
post-processing a stereo signal comprising left and right signals.

Fig. 1 is a block diagram of an encoder/decoder system in which the present
invention is intended to be used. In the audio system 1 an N-channel audio signal is supplied
to an encoder 2, with N being an integer which is larger than 2. The encoder 2 transforms the
N-channel audio signals to signals L0 and R0 and parametric decoder information P, by
means of which a decoder can decode the information and estimate the original N-channel
signals to be output from the decoder. The spatial parameter set P is preferably time and/or
frequency dependent. The N-channel signals may be signals for a 5.1 system, comprising a
center channel, two front channels, two surround channels and an LFE channel.

The encoded stereo signal pair L0 and R0 and the decoder spatial information P
are transmitted to the user in a suitable way, such as by CD, DVD, VHS Hi-Fi, broadcast,
laser disc, DBS, digital cable, Internet or any other transmission or distribution system,
indicated by the circle line 4 in Fig. 1. Since the left and right signals are transmitted, the
system is compatible with the vast number of receiving equipment that can only reproduce
stereo signals. If the receiving equipment includes a decoder, the decoder may decode the
N-channel signals and provide an estimate thereof, based on the information in the stereo signal
pair L0 and R0 as well as the decoder spatial information signals or spatial parameters P.

However, due to the decreased number of playback signals, stereo signals are
lacking spatial information compared to the N-channel signals or other properties that may be
desired for certain situations. Thus, according to the present invention, there is provided a
post-processor 5 which processes the stereo signal prior to the transmission/distribution to the
receiver. The post-processing may be position-dependent "addition" of bass or reverberation,
or removal of vocals (karaoke with vocals in center channel).

Other examples of post-processing are stereo-base-widening, which may be
performed by making use of the knowledge of the composition of the original surround mix,
such as front/back, since the contribution of individual input signals is known from the
decoder information signals P. In principle, stereo widening can already be applied in the
encoder, but this is generally not invertible: since only two signals are available in the
decoder, instead of N, inversion is generally impossible. Besides stereo widening,
other post-processing techniques on the individual multi-channel contributions are also possible.

According to the invention, the post-processed signals are transmitted to a
receiver as indicated by the circle 6 in Fig. 1. The inventive device for processing a stereo
signal obtained from an encoder comprises the post-processor 5. The encoder apparatus
according to the present invention comprises the encoder 2 and the post-processor 5.

The signal received may be used directly, for example if the receiver does not
include a multi-channel decoder. This may be the case in a computer receiving the signal 6
over the Internet, or in a receiver having only two loudspeakers. Such a received signal is
perceived as a high quality signal, since it has improved spatial impression or other
characteristics as determined in the processing thereof by the encoder and the post-processor.

If the signal should be used for decoding in a conventional N-channel decoder
3, it must first be inverse post-processed by an inverse post-processor 7, in order to
reconstruct the original stereo signal pair L0 and R0, which together with the decoder
information or spatial parameters P produces an estimated N-channel signal. According to
the invention, such reconstruction of the multi-channel mix is possible, and the reconstruction
is hardly affected by the post-processing. Post-processing in the decoder is also possible for
stereo playback as a user-selectable feature, without the necessity to determine the
multi-channel signal first. The inventive device for processing a stereo signal comprising left and
right signals comprises the inverse post-processor 7. The decoder apparatus according to the
present invention comprises the decoder 3 and the inverse post-processor 7.

Without post-processing the down-mix is comparable with a standard ITU
down-mix. The inventive method, however, may improve the down-mix significantly.

The inventive method is able to determine the contribution in the down-mix of 
the original channels in the multi-channel mix with the help of the determined spatial 
parameters P in the encoder. In this way post-processing can be applied to specific channels 
of the multi-channel mix, for example stereo-base-widening of the rear channels, whilst the 
other channels are not affected. The post-processing does not affect the final multi-channel 
reconstruction if the post-processing is invertible. It can also be applied for an improved 
stereo playback without the necessity to reconstruct the multi-channel mix first. 

This method differs from existing post-processing techniques in that it uses the 
knowledge of the original multi-channel mix, i.e. the determined spatial parameters P. 

The encoder 2 operates in the following way: 

Assume an N-channel audio signal as an input signal to the encoder 2, where
z1[n], z2[n], ..., zN[n] describe the discrete time-domain waveforms of the N channels. These N
signals are segmented using a common segmentation, preferably using overlapping analysis
windows. Subsequently, each segment is converted to the frequency domain using a complex
transform (e.g., FFT). However, complex filter-bank structures may also be appropriate to
obtain time/frequency tiles. This process results in segmented, sub-band representations of
the input signals, which will be denoted by Z1[k], Z2[k], ..., ZN[k], with k denoting the
frequency index.
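The segmentation and transform step can be sketched as follows. This is a minimal illustration only, assuming a Hann analysis window, 50% overlap and a naive DFT standing in for an FFT; the window type, the frame and hop sizes and the function names `hann`, `dft` and `analyze` are illustrative choices that do not come from the text.

```python
import cmath
import math


def hann(length):
    """Periodic Hann window (an assumed choice of analysis window)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def dft(frame):
    """Naive O(N^2) complex DFT; an FFT would give the same values faster."""
    size = len(frame)
    return [
        sum(frame[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
        for k in range(size)
    ]


def analyze(z, frame_len=8, hop=4):
    """Split waveform z[n] into overlapping windowed segments and transform each.

    Returns a list of segments; each segment is a list of complex bins Z[k].
    """
    window = hann(frame_len)
    segments = []
    for start in range(0, len(z) - frame_len + 1, hop):
        frame = [z[start + n] * window[n] for n in range(frame_len)]
        segments.append(dft(frame))
    return segments
```

Applying `analyze` with the same frame and hop sizes to each of the N channels gives the common segmentation the text asks for.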

From these N channels, 2 down-mix channels are created, being L0[k] and
R0[k]. Each down-mix channel is a linear combination of the N input signals:

$$L_0[k] = \sum_{i=1}^{N} \alpha_i Z_i[k]$$

$$R_0[k] = \sum_{i=1}^{N} \beta_i Z_i[k]$$

The parameters α_i and β_i are chosen such that the stereo signal consisting of
L0[k] and R0[k] has a good stereo image. In case of a 5-channel input signal consisting of Lf,
Rf, C, Ls, and Rs (for the left-front, right-front, center, left-surround, right-surround channels,
respectively), a suitable down-mix can be obtained according to:

$$L_0[k] = L[k] + C[k]/\sqrt{2}$$

$$R_0[k] = R[k] + C[k]/\sqrt{2}$$

The signals L and R can be obtained according to the equations:

$$L[k] = L_f[k] + L_s[k]/\sqrt{2}$$

$$R[k] = R_f[k] + R_s[k]/\sqrt{2}$$
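The two-stage down-mix can be sketched per frequency bin as shown here. The sketch takes the centre and surround weights to be 1/sqrt(2); the function name `downmix_bin` and its argument names are hypothetical.

```python
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)  # assumed weight for centre and surround


def downmix_bin(lf, rf, c, ls, rs):
    """Stereo down-mix of one frequency bin of a 5-channel signal.

    First fold each surround channel into its front channel, then add the
    centre channel to both sides, each contribution scaled by 1/sqrt(2).
    """
    l = lf + ls * INV_SQRT2
    r = rf + rs * INV_SQRT2
    l0 = l + c * INV_SQRT2
    r0 = r + c * INV_SQRT2
    return l0, r0
```

Because every step is linear, the result is one instance of the general linear combination of the N input signals described above.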

Additionally, spatial parameters P are extracted to enable perceptual
reconstruction of the signals Lf, Rf, C, Ls and Rs from L0 and R0.

In an embodiment, the parameter set P includes inter-channel intensity
differences (IIDs) and possibly inter-channel cross-correlation (ICC) values between the
signal pairs (Lf, Ls) and (Rf, Rs). The IID and ICC between the Lf, Ls pair are obtained
according to the equations:

$$\mathrm{IID}_l = \frac{\sum_k L_f[k]\, L_f^*[k]}{\sum_k L_s[k]\, L_s^*[k]}$$

$$\mathrm{ICC}_l = \Re\left( \frac{\sum_k L_f[k]\, L_s^*[k]}{\sqrt{\sum_k L_f[k]\, L_f^*[k] \, \sum_k L_s[k]\, L_s^*[k]}} \right)$$

Here, (*) denotes the complex conjugation. For other signal pairs, similar
equations can be used. Thus, the parameter IID_l describes the relative amount of energy
between the left-front and left-surround channels and the parameter ICC_l describes the
amount of mutual correlation between the left-front and left-surround channels. These
parameters essentially describe the perceptually relevant parameters between front and
surround channels.
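A minimal sketch of the parameter extraction for one band follows. It assumes that the IID is the front-to-surround energy ratio and that the ICC is the real part of the cross-correlation normalised by the geometric mean of the two energies; the normalisation and the function name `iid_icc` are assumptions made for illustration.

```python
import math


def iid_icc(front, surround):
    """Return (IID, ICC) for one parameter band.

    front, surround: equal-length lists of complex bins, e.g. Lf[k] and Ls[k].
    IID is the front-to-surround energy ratio; ICC is the real part of the
    normalised cross-correlation. Both bands must contain some energy.
    """
    e_front = sum((x * x.conjugate()).real for x in front)
    e_surround = sum((y * y.conjugate()).real for y in surround)
    cross = sum(x * y.conjugate() for x, y in zip(front, surround))
    iid = e_front / e_surround
    icc = (cross / math.sqrt(e_front * e_surround)).real
    return iid, icc
```

The same function applies to the (Rf, Rs) pair to give the right-side parameters.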

A parameterization of the amount of center signal which is present in L0, R0
can be obtained by estimating two prediction parameters c1 and c2. These two prediction
parameters define a 2x3 matrix which controls the decoder upmix process from L0, R0 to L,
C, and R.

An implementation of the upmix matrix M is given by:

[Equation for the upmix matrix M; only the entries c2 - 1, c1 - 1, 1 - c1 and 1 - c2 are legible.]



For the example shown above, the parameter set P includes {c1, c2, IID_l, ICC_l,
IID_r, ICC_r} for each time/frequency tile.

On the resulting stereo signal pair (L0, R0), post-processing can be applied in a
way that it mainly affects the contribution of Z_i[k], for example Ls and Rs, in the stereo mix.
In Fig. 1 the position of this block in the codec is shown.

Fig. 2 is a detailed view of the post-processor 5 in Fig. 1 according to an
embodiment of the invention. The post-processed left signal L0w is the sum of three signals,
namely the left signal L0 modified by a transfer function H_A, the left signal L0 modified by a
transfer function H_B and the right signal R0 modified by a transfer function H_D. In the same
way, the post-processed right signal R0w is the sum of three signals, namely the right signal
R0 modified by a transfer function H_F, the right signal R0 modified by a transfer function H_E
and the left signal L0 modified by a transfer function H_C. The transfer functions H_A to H_F may
be implemented as FIR or IIR-type filters, or can simply be (complex) scale factors which
may be frequency dependent. Furthermore, the transfer function H_A may be a multiplication
with a second parameter (1 - w_l) and transfer function H_B may include a first parameter w_l,
whereby this parameter w_l determines the amount of post-processing of the stereo signal.

This is shown in Fig. 3. The parameter w_l determines the amount of
post-processing of L0[k] and w_r that of R0[k]. When w_l is equal to 0, L0[k] is unaffected, and
when w_l is equal to 1, L0[k] is maximally affected. The same holds for w_r with respect to R0[k].
The following equations hold for the post-processing parameters w_l and w_r:

$$w_l = f_l(\mathrm{IID}_l, \mathrm{ICC}_l, c_1, c_2)$$

$$w_r = f_r(\mathrm{IID}_r, \mathrm{ICC}_r, c_1, c_2)$$

The blocks H1, H2, H3 and H4 in Fig. 3 are filter functions, which can be
various types of filters, for example stereo widening filters, as shown below.

The resulting outputs are:

$$\begin{bmatrix} L_{0w} \\ R_{0w} \end{bmatrix} = H \begin{bmatrix} L_0 \\ R_0 \end{bmatrix}$$

in which:

$$H = \begin{bmatrix} (1-w_l)^a + w_l^a H_1 & w_r^a H_3 \\ w_l^a H_2 & (1-w_r)^a + w_r^a H_4 \end{bmatrix}$$

with a an arbitrary constant (e.g., +1).

If the filter functions H1, H2, H3 and H4 are chosen properly, the transfer
function matrix H can be inverted. Moreover, to enable computation of the inverse matrix at
the decoder side, the filter functions H1, H2, H3 and H4 and the parameters w_l and w_r should be
known at the decoder. This is possible since w_l and w_r can be calculated from the transmitted
parameters. Thus, the original stereo signal L0, R0 will be available again, which is necessary
for decoding of the multi-channel mix.
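For a single frequency band in which H1 to H4 reduce to complex gains, the matrix product can be sketched as follows. The sketch assumes the layout H = [[(1-w_l)^a + w_l^a H1, w_r^a H3], [w_l^a H2, (1-w_r)^a + w_r^a H4]]; the function name `post_process_bin` and its argument order are hypothetical.

```python
def post_process_bin(l0, r0, w_l, w_r, h1, h2, h3, h4, a=1.0):
    """Apply the 2x2 post-processing matrix H to one frequency bin.

    w_l, w_r: weights in [0, 1]; h1..h4: (complex) filter gains for this bin;
    a: the constant exponent. Returns the processed pair (L0w, R0w).
    """
    h11 = (1.0 - w_l) ** a + (w_l ** a) * h1
    h12 = (w_r ** a) * h3
    h21 = (w_l ** a) * h2
    h22 = (1.0 - w_r) ** a + (w_r ** a) * h4
    return h11 * l0 + h12 * r0, h21 * l0 + h22 * r0
```

With both weights at 0 the pair passes through unchanged, and with both at 1 the output is determined entirely by the four gains, which matches the limiting cases described in the text.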
Another possibility is to transmit the original stereo signal and apply the
post-processing in the decoder to make improved stereo playback possible without the
necessity to determine the multi-channel mix first.

Below, an embodiment of the post-processing is described in detail. However,
the invention is not limited to the exact details but may be varied within the scope of
the invention as defined in the appended patent claims.

The post-processing parameters or weights w_l and w_r are a function of the
transmitted spatial parameters:

$$(w_l, w_r) = f(P)$$

The function f is designed in such a way that w_l increases if the signal L0
contains more energy from the left-surround signal compared to the left-front or center
signals. In a similar way, w_r increases with increasing relative energy of the right-surround
signal present in R0. A convenient expression for w_l and w_r is given by:

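$$w_l = f_1(c_1)\, f_2(\mathrm{IID}_l)$$

$$w_r = f_1(c_2)\, f_2(\mathrm{IID}_r)$$

with

$$f_1(x) = \begin{cases} 2x - 1 & \text{for } 0.5 \le x \le 1 \\ 0 & \text{for } x < 0.5 \\ 1 & \text{for } x > 1 \end{cases}$$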

and

$$f_2(x) = \frac{1}{1 + x}$$
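The weight computation can be sketched as follows. The piecewise-linear f1 follows the three cases listed in the text; the form f2(x) = 1/(1 + x) is an assumption, chosen because it is consistent with the denominator 1 + x and with the requirement that the weight grows when the surround channel carries relatively more energy. The function names are hypothetical.

```python
def f1(x):
    """Piecewise-linear map: 0 below 0.5, 2x - 1 on [0.5, 1], 1 above 1."""
    if x < 0.5:
        return 0.0
    if x > 1.0:
        return 1.0
    return 2.0 * x - 1.0


def f2(x):
    """Assumed form 1 / (1 + x); decreases as the front/surround ratio x grows."""
    return 1.0 / (1.0 + x)


def weights(c1, c2, iid_l, iid_r):
    """Return (w_l, w_r) from the prediction parameters and the two IIDs."""
    return f1(c1) * f2(iid_l), f1(c2) * f2(iid_r)
```

Since f1 and f2 both stay within [0, 1] for non-negative IID values, the resulting weights stay within [0, 1] as well.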

For the filter functions H1, H2, H3, and H4 the following exemplary functions
are then chosen (in the z-domain):

$$H_1(z) = H_4(z) = 0.8\,(1.0 + 0.2 z^{-1} + 0.2 z^{-2})$$

$$H_2(z) = H_3(z) = 0.8\,(-1.0 z^{-1} - 0.2 z^{-2})$$
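To use these filters as per-band scale factors, their responses can be sampled on the unit circle, as sketched here. The sketch reads the exemplary filters as H1 = H4 = 0.8(1 + 0.2 z^-1 + 0.2 z^-2) and H2 = H3 = 0.8(-z^-1 - 0.2 z^-2); the uniform placement of band centres in `band_gains` and all function names are assumptions.

```python
import cmath
import math


def h1(omega):
    """H1 = H4 evaluated at z = exp(j*omega)."""
    z1 = cmath.exp(-1j * omega)  # z^-1 on the unit circle
    return 0.8 * (1.0 + 0.2 * z1 + 0.2 * z1 * z1)


def h2(omega):
    """H2 = H3 evaluated at z = exp(j*omega)."""
    z1 = cmath.exp(-1j * omega)
    return 0.8 * (-1.0 * z1 - 0.2 * z1 * z1)


def band_gains(num_bands):
    """Sample both responses at band centres spread over 0..pi (assumed layout)."""
    centres = [math.pi * (b + 0.5) / num_bands for b in range(num_bands)]
    return [(h1(w), h2(w)) for w in centres]
```

Each returned pair gives one complex gain for H1 = H4 and one for H2 = H3 in that band.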

This invention can be integrated in a multi-channel audio encoder
apparatus that creates a stereo-compatible down-mix. The general scheme of such a
multi-channel parametric audio encoder which is enhanced by the post-processing
scheme as described above can be outlined as follows:

- Conversion of the multi-channel input signal to the frequency domain, either
by segmentation and transform or by applying a filterbank;

- Extraction of spatial parameters P and generation of a down-mix in the
frequency domain;

- Application of the post-processing algorithm in the frequency domain;

- Conversion of the post-processed signals to the time domain;

- Encoding the stereo signal using conventional coding techniques, such as
defined in MPEG;

- Multiplexing the stereo bit-stream with the encoded parameters P to form a
total output bit-stream.

A corresponding multi-channel decoder apparatus (i.e., a decoder with
integrated post-processing inversion) can be outlined as follows:

- Demultiplexing the parameter bit-stream to retrieve the parameters P and the
encoded stereo signal;

- Decoding the stereo signal;

- Conversion of the decoded stereo signal to the frequency domain;

- Applying the post-processing inversion based on the parameters P;

- Upmix from stereo to multi-channel output based on the parameters P;

- Conversion of the multi-channel output to the time domain.

Since the post-processing and inverse post-processing are performed in
the frequency domain, the filter functions H1 to H4 are preferably converted or
approximated in the frequency domain by simple (real-valued or complex) scale
factors, which may be frequency dependent.

Those skilled in the art may understand that one or more processing
stages as outlined above may be combined as a single processing stage.

Another application of the invention is to apply the post-processing on
the stereo signal at the decoder side only (i.e., without post-processing at the encoder
side). Using this approach, the decoder can generate an enhanced stereo signal from a
non-enhanced stereo signal.

Extra information can be provided in the bit-stream which signals whether or
not the post-processing has been done, and which parameter functions f1, f2 and which filter
functions H1, H2, H3, and H4 have been used, which enables inverse post-processing.

A filter function may be described as a multiplication in the frequency domain.
Since parameters are present for individual frequency bands, the invention may be
implemented as simple, complex gains instead of filters, which are applied individually in
different frequency bands. In this case, frequency bands of L0w, R0w are obtained by a simple
(2x2) matrix multiplication from corresponding frequency bands of (L0, R0). The actual
matrix entries are determined by the parameters and the frequency-domain representations of the
filter functions H, thus consisting of the time-invariant gains H and the time/frequency-variant,
parameter-controlled gains w_l and w_r. Because the filters are scalars for each band, inversion
is possible.

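The post-processing in the encoder can be described by the following matrix
equation:

$$\begin{bmatrix} L_{0w} \\ R_{0w} \end{bmatrix} = H \begin{bmatrix} L_0 \\ R_0 \end{bmatrix}$$

where

$$H = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix} = \begin{bmatrix} (1-w_l)^a + w_l^a H_1 & w_r^a H_3 \\ w_l^a H_2 & (1-w_r)^a + w_r^a H_4 \end{bmatrix}$$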

This matrix equation is applied for each frequency band. The matrix H
consists entirely of scalars. The use of scalars makes the post-processing and the inverse
post-processing relatively easy.

The parameters w_l and w_r are scalars and functions of the parameter set P.
These 2 parameters determine the amount of post-processing of the input channels.

The parameters H1 to H4 are complex filter functions.

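The inversion of this process can also be done by a simple matrix
multiplication per frequency band. The following equation is applied per frequency band:

$$\begin{bmatrix} L_0 \\ R_0 \end{bmatrix} = H^{-1} \begin{bmatrix} L_{0w} \\ R_{0w} \end{bmatrix}$$

where

$$H^{-1} = \begin{bmatrix} k_1 & k_2 \\ k_3 & k_4 \end{bmatrix} = \frac{1}{h_{11} h_{22} - h_{12} h_{21}} \begin{bmatrix} h_{22} & -h_{12} \\ -h_{21} & h_{11} \end{bmatrix}$$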

The matrix H^-1 contains only scalars. The elements of H^-1, k1 ... k4, are
also functions of the parameter set P. When the functions in the matrix H, h11 ... h22, and
the parameters P are known in the decoder, then the post-processing can be inverted.
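The per-band inversion can be sketched with the closed-form inverse of a 2x2 matrix. The function names `invert_2x2` and `inverse_post_process_bin` are hypothetical, and the entries h11, h12, h21 and h22 are assumed to have been rebuilt at the decoder from the transmitted parameters P and the known filter gains.

```python
def invert_2x2(h11, h12, h21, h22):
    """Return (k1, k2, k3, k4), the entries of the inverse of [[h11, h12], [h21, h22]]."""
    det = h11 * h22 - h12 * h21
    if det == 0:
        raise ValueError("matrix H is singular; post-processing cannot be undone")
    return h22 / det, -h12 / det, -h21 / det, h11 / det


def inverse_post_process_bin(l0w, r0w, h11, h12, h21, h22):
    """Recover (L0, R0) for one band from the post-processed pair (L0w, R0w)."""
    k1, k2, k3, k4 = invert_2x2(h11, h12, h21, h22)
    return k1 * l0w + k2 * r0w, k3 * l0w + k4 * r0w
```

Applying the forward matrix and then `inverse_post_process_bin` with the same entries returns the original pair up to rounding error, which is the property the multi-channel decoder relies on.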

A block diagram of an inverse post-processor 7 which performs such inverse
post-processing is illustrated in Fig. 4.

This inversion is possible when the determinant of the matrix H is not equal to
zero. The determinant of H is equal to:

$$\det(H) = h_{11} h_{22} - h_{12} h_{21} = (1-w_l)^a (1-w_r)^a + (1-w_l)^a w_r^a H_4 + (1-w_r)^a w_l^a H_1 + w_l^a w_r^a (H_1 H_4 - H_2 H_3)$$

When suitable functions h11 ... h22 are chosen, det(H) will be non-zero,
so the process is invertible.
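A minimal sketch of an invertibility check for one band follows, using the expanded determinant (1-w_l)^a (1-w_r)^a + (1-w_l)^a w_r^a H4 + (1-w_r)^a w_l^a H1 + w_l^a w_r^a (H1 H4 - H2 H3). The tolerance `tol` and the function names are assumptions added for numerical robustness; they are not part of the text.

```python
def det_h(w_l, w_r, h1, h2, h3, h4, a=1.0):
    """Determinant of H from the expanded expression."""
    return (
        (1.0 - w_l) ** a * (1.0 - w_r) ** a
        + (1.0 - w_l) ** a * w_r ** a * h4
        + (1.0 - w_r) ** a * w_l ** a * h1
        + w_l ** a * w_r ** a * (h1 * h4 - h2 * h3)
    )


def is_invertible(w_l, w_r, h1, h2, h3, h4, a=1.0, tol=1e-9):
    """True if |det(H)| exceeds tol (tol is an assumed numerical safety margin)."""
    return abs(det_h(w_l, w_r, h1, h2, h3, h4, a)) > tol
```

With both weights at 0 the determinant is 1, and with both at 1 it reduces to H1 H4 - H2 H3, so in that limit the choice of filter gains alone decides whether the processing can be undone.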

It is mentioned that the expression "comprising" does not exclude other
elements or steps and that "a" or "an" does not exclude a plurality of elements. Moreover,
reference signs in the claims shall not be construed as limiting the scope of the claims.

Hereinabove, the invention has been described with reference to specific 
embodiments. However, the invention is not limited to the various embodiments 
described but may be amended and combined in different manners as is apparent to a 
skilled person reading the present specification. 



